Abstract

The Brier Score is a widely-used criterion to as- 
sess the quality of probabilistic predictions of bi- 
nary events. The expectation value of the Brier 
Score can be decomposed into the sum of three 
components called reliability, resolution, and un- 
certainty which characterize different forecast at- 
tributes. Given a dataset of forecast probabilities 
and corresponding binary verifications, these three 
components can be estimated empirically. Here, 
propagation of uncertainty is used to derive expres- 
sions that approximate the variances of these esti- 
mators. Variance estimates are provided for both 
the traditional estimators, as well as for refined esti- 
mators that include a bias correction. Applications 
of the derived variance estimates to artificial data 
illustrate their validity, and application to a mete- 
orological prediction problem illustrates a possible 
use case. The observed increase of variance of the 
bias-corrected estimators is discussed. 



1 Introduction 

The basis of the following discussion is a data set of
forecast probabilities $\{p_n\}_{n=1}^{N}$ and corresponding
verifications $\{y_n\}_{n=1}^{N}$. We assume a binary prediction
setting, that is, the verification at instance $n$,
$y_n$, is either one if the event happens, or zero if it
does not happen. The forecast probability $p_n$ is a
probabilistic prediction for the event $y_n = 1$. The
empirical Brier Score (Brier, 1950) assigned to the
set of forecasts $\{p_n\}$ is given by



$$\mathrm{Br} = \frac{1}{N} \sum_{n=1}^{N} (p_n - y_n)^2. \qquad (1)$$



The Brier Score is negatively oriented, assigning 
lower values to better forecasts. The Brier Score 
further has the property of being proper, which
means that a forecaster cannot improve his expected
Brier Score by issuing forecasts $q$ that differ from his
best estimates $p$ of the actual event probabilities. In
fact, any such deviation from $p$ will increase his
expected Brier Score, which makes the Brier Score a
strictly proper scoring rule (DeGroot and Fienberg, 1983).
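As a small illustration of Eq. (1) and of properness, the following Python sketch computes an empirical Brier Score and searches a grid of issued probabilities for the one with the lowest expected score. The function names `brier_score` and `expected_brier`, the grid, and the value `p_true = 0.3` are illustrative assumptions and are not taken from the article's R code.

```python
def brier_score(p, y):
    """Empirical Brier Score: mean squared difference between
    forecast probabilities p and binary verifications y."""
    return sum((p_n - y_n) ** 2 for p_n, y_n in zip(p, y)) / len(p)


def expected_brier(q, p):
    """Expected score of issuing probability q when the event
    actually occurs with probability p."""
    return p * (q - 1.0) ** 2 + (1.0 - p) * q ** 2


# Hypothetical example data (not from the article).
p = [0.05, 0.15, 0.15, 0.45, 1.0, 1.0]
y = [0, 0, 1, 1, 1, 0]
br = brier_score(p, y)

# Properness: on a grid of candidate forecasts, the expected score
# is minimized by issuing the actual event probability.
p_true = 0.3
candidates = [i / 100 for i in range(101)]
best_q = min(candidates, key=lambda q: expected_brier(q, p_true))
print(br, best_q)
```

Algebraically, the expected score equals $q^2 - 2pq + p$, which has its unique minimum at $q = p$; the grid search merely makes this visible.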



It has been shown by Murphy (1973) that the Brier
Score can be decomposed additively into three non- 
negative terms, called reliability, resolution, and 
uncertainty: 

Br = REL - RES + UNC (2) 

A qualitative interpretation of the individual com- 
ponents is given next; mathematical details follow 
below. The reliability term quantifies how far the
forecast probabilities $p_n$ differ from the corresponding
conditional event probabilities $P(y_n = 1 \mid p_n)$.
Ideally, it should always hold that $p_n = P(y_n = 1 \mid p_n)$;
in this case the reliability component vanishes.
A systematic difference between the two terms is 
penalized by a positive reliability component. The 
resolution component rewards variations of the fore- 
cast probabilities that are consistent with varying 
event probabilities. A forecasting scheme that con- 
stantly issues the same probabilities has zero res- 
olution. Any meaningful variability of the fore- 
cast leads to a positive resolution term which im- 
proves the Brier Score. The uncertainty component 
is equal to the Brier Score of the average (climatological)
probability. It thus serves as a benchmark to which the
Brier Score of the forecast under consideration can be
compared. Since the Brier Score is negatively oriented, a
'useful' forecast should have a Brier Score that is lower
than its uncertainty component, or in other words, the
resolution should be larger than the reliability.

Consider the forecast probability $p$ and the corresponding
verification $y$ as two (dependent) random quantities. Then
the calibration function $\pi(p)$ and the climatology
$\bar\pi$ are defined as

$$\pi(p) = P(y = 1 \mid p), \quad \text{and} \qquad (3)$$

$$\bar\pi = P(y = 1). \qquad (4)$$

Using these definitions, the three components of the
Brier Score decomposition are formally given by

$$\mathrm{REL}^* = E\left[(p - \pi(p))^2\right], \qquad (5)$$
$$\mathrm{RES}^* = E\left[(\pi(p) - \bar\pi)^2\right], \quad \text{and} \qquad (6)$$
$$\mathrm{UNC}^* = \bar\pi (1 - \bar\pi), \qquad (7)$$

where $E$ denotes the mathematical expectation value
(Bröcker, 2009). The star (*) is used to differentiate
the exact analytical expressions from their empirical
estimators, which are discussed below.

In practice, the three components of the Brier Score
decomposition must be estimated empirically from
the set of forecast probabilities and corresponding
verifications $\{p_n, y_n\}$. Such estimators are derived
in Murphy (1973); they are presented below in a
somewhat different notation, which is suitable for
variance estimation by propagation of uncertainty
(see Sec. 2).

First of all, the observed forecast probabilities $\{p_n\}$
are binned into $D$ mutually exclusive and collectively
exhaustive bins $\mathfrak{p}_d$, where $d = 1, \dots, D$. Here,
bins of equal width which are half-open to the left
are used, except the first bin which is closed (but
the theory also applies to variable bin widths). As
an example, if $D = 3$ we would have
$\{\mathfrak{p}_d\}_{d=1}^{3} = \{[0, 1/3], (1/3, 2/3], (2/3, 1]\}$.
Using this binning of the forecast probabilities, the
following matrices are defined:

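$$A \in \{0,1\}^{N \times D}, \quad A_{nd} = I(p_n \in \mathfrak{p}_d), \qquad (8)$$
$$B \in \{0,1\}^{N \times D}, \quad B_{nd} = I(p_n \in \mathfrak{p}_d)\, y_n, \qquad (9)$$
$$C \in [0,1]^{N \times D}, \quad C_{nd} = I(p_n \in \mathfrak{p}_d)\, p_n, \qquad (10)$$
$$Y \in \{0,1\}^{N \times 1}, \quad Y_n = y_n, \qquad (11)$$

where $I(\cdot)$ denotes the indicator function and
$\mathfrak{p}_d$ denotes the $d$-th bin. Summation over a
column or row of a matrix is abbreviated by a bullet
($\bullet$), for example

$$A_{\bullet d} = \sum_{n=1}^{N} A_{nd}. \qquad (12)$$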


A bullet without a second index always refers to the
row vector of column sums of a matrix, as in

$$A_{\bullet} = \mathbf{1}^T A, \qquad (13)$$

where $\mathbf{1}$ is the $N \times 1$ column vector with all
elements equal to one.

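Using these definitions, $A_{\bullet d}$ is equal to the total
number of cases where $p_n$ falls into the $d$-th bin,
$p_n \in \mathfrak{p}_d$. $B_{\bullet d}$ is equal to the number of
cases where $p_n \in \mathfrak{p}_d$ and at the same time
$y_n = 1$. Therefore, a binned estimator for the
calibration function is given by

$$\pi(p) \approx \pi_d := B_{\bullet d} / A_{\bullet d}, \qquad (14)$$

where $p \in \mathfrak{p}_d$. The climatology is estimated by

$$\bar\pi \approx \frac{Y_{\bullet}}{N}. \qquad (15)$$

Furthermore, $C_{\bullet d} / A_{\bullet d}$ is equal to the average
forecast probability in the $d$-th bin. $Y_{\bullet}$ is equal to
the total number of events that have occurred. Lastly,
note that $B_{\bullet\bullet} = Y_{\bullet}$ and $A_{\bullet\bullet} = N$.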
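The binning and the column sums of Eqs. (8)-(12) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `bin_index` and `column_sums` and the six example forecasts are hypothetical, equal-width bins on [0, 1] are assumed, and forecasts lying exactly on a bin edge may be affected by floating-point rounding.

```python
import math


def bin_index(p, D):
    """0-based index of the bin containing p, for D equal-width bins.
    Bins are half-open to the left, (lower, upper], except the first
    bin, which is closed: [0, 1/D]."""
    return max(0, math.ceil(p * D) - 1)


def column_sums(p, y, D):
    """Accumulate the column sums of the matrices A, B, C directly,
    without storing the N x D matrices. Also returns Y = sum of y."""
    A = [0] * D      # number of forecasts per bin
    B = [0] * D      # number of events per bin
    C = [0.0] * D    # sum of forecast probabilities per bin
    for p_n, y_n in zip(p, y):
        d = bin_index(p_n, D)
        A[d] += 1
        B[d] += y_n
        C[d] += p_n
    return A, B, C, sum(y)


# Hypothetical example data (not from the article).
p = [0.05, 0.15, 0.15, 0.45, 1.0, 1.0]
y = [0, 0, 1, 1, 1, 0]
A, B, C, Y = column_sums(p, y, D=10)
print(A, B, C, Y)
```

In this example the identities $B_{\bullet\bullet} = Y_{\bullet}$ and $A_{\bullet\bullet} = N$ can be checked by summing the returned lists.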

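Using this notation, the estimators for the three
components of the Brier Score decomposition originally
proposed by Murphy (1973) are given by

$$\mathrm{REL} = \mathrm{REL}(A_{\bullet}, B_{\bullet}, C_{\bullet}) = \frac{1}{N} \sum_{d \in \mathcal{D}_0} A_{\bullet d} \left( \frac{C_{\bullet d}}{A_{\bullet d}} - \frac{B_{\bullet d}}{A_{\bullet d}} \right)^2, \qquad (16)$$

$$\mathrm{RES} = \mathrm{RES}(A_{\bullet}, B_{\bullet}, Y_{\bullet}) = \frac{1}{N} \sum_{d \in \mathcal{D}_0} A_{\bullet d} \left( \frac{B_{\bullet d}}{A_{\bullet d}} - \frac{Y_{\bullet}}{N} \right)^2, \qquad (17)$$

$$\mathrm{UNC} = \mathrm{UNC}(Y_{\bullet}) = \frac{Y_{\bullet} (N - Y_{\bullet})}{N^2}, \qquad (18)$$

where $\mathcal{D}_0 = \{d : A_{\bullet d} > 0\}$. In the following we
refer to REL, RES, and UNC as the traditional
estimators of the components of the Brier Score
decomposition.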
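A minimal Python sketch of the traditional estimators, written in terms of the column sums, may help to make the formulas concrete. It assumes REL $= \frac{1}{N}\sum_d A_{\bullet d}(C_{\bullet d}/A_{\bullet d} - B_{\bullet d}/A_{\bullet d})^2$ together with the RES and UNC expressions above; the function name and the example column sums (six hypothetical forecasts in 10 bins) are illustrative only.

```python
def traditional_decomposition(A, B, C, Y):
    """Traditional estimators (REL, RES, UNC) from the column sums
    A, B, C (lists of length D) and the event count Y.
    Empty bins (A_d = 0) are skipped."""
    N = sum(A)
    rel = 0.0
    res = 0.0
    for a, b, c in zip(A, B, C):
        if a > 0:
            rel += a * (c / a - b / a) ** 2
            res += a * (b / a - Y / N) ** 2
    unc = Y * (N - Y) / N ** 2
    return rel / N, res / N, unc


# Hypothetical column sums: forecasts 0.05, 0.15, 0.15, 0.45, 1, 1
# with verifications 0, 0, 1, 1, 1, 0, binned into D = 10 bins.
A = [1, 2, 0, 0, 1, 0, 0, 0, 0, 2]
B = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
C = [0.05, 0.30, 0.0, 0.0, 0.45, 0.0, 0.0, 0.0, 0.0, 2.0]
Y = 3

rel, res, unc = traditional_decomposition(A, B, C, Y)
print(rel, res, unc, rel - res + unc)
```

Because all forecasts within a bin are identical in this toy example, REL - RES + UNC reproduces the empirical Brier Score of the six forecasts exactly; with forecasts that vary within a bin, the two generally differ slightly.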

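In Ferro and Fricker (2012) it is shown that the
traditional estimators are biased. They show that
the bias can be corrected to some extent, although
never perfectly eliminated. Using our notation, the
estimators proposed by Ferro and Fricker (2012) are
given by

$$\mathrm{REL}'(A_{\bullet}, B_{\bullet}, C_{\bullet}) = \mathrm{REL} - \frac{1}{N} \sum_{d \in \mathcal{D}_1} \frac{B_{\bullet d} (A_{\bullet d} - B_{\bullet d})}{A_{\bullet d} (A_{\bullet d} - 1)}, \qquad (19)$$

$$\mathrm{RES}'(A_{\bullet}, B_{\bullet}, Y_{\bullet}) = \mathrm{RES} - \frac{1}{N} \sum_{d \in \mathcal{D}_1} \frac{B_{\bullet d} (A_{\bullet d} - B_{\bullet d})}{A_{\bullet d} (A_{\bullet d} - 1)} + \frac{Y_{\bullet} (N - Y_{\bullet})}{N^2 (N - 1)}, \qquad (20)$$

and

$$\mathrm{UNC}'(Y_{\bullet}) = \mathrm{UNC} + \frac{Y_{\bullet} (N - Y_{\bullet})}{N^2 (N - 1)} = \frac{Y_{\bullet} (N - Y_{\bullet})}{N (N - 1)}, \qquad (21)$$

where $\mathcal{D}_1 = \{d : A_{\bullet d} > 1\}$. We refer to REL', RES',
and UNC' as the bias-corrected estimators.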
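The bias-corrected estimators can be sketched in the same way. The Python function below is an illustration, not the authors' implementation; it assumes the correction terms $S = \frac{1}{N}\sum_{d: A_{\bullet d} > 1} B_{\bullet d}(A_{\bullet d} - B_{\bullet d}) / (A_{\bullet d}(A_{\bullet d} - 1))$ and $T = Y_{\bullet}(N - Y_{\bullet}) / (N^2 (N-1))$, and reuses a deliberately tiny, hypothetical data set of six forecasts.

```python
def bias_corrected_decomposition(A, B, C, Y):
    """Bias-corrected estimators (REL', RES', UNC') from column sums.
    S is subtracted from REL and RES, T is added to RES and UNC,
    so that REL' - RES' + UNC' equals REL - RES + UNC."""
    N = sum(A)
    rel = sum(a * (c / a - b / a) ** 2
              for a, b, c in zip(A, B, C) if a > 0) / N
    res = sum(a * (b / a - Y / N) ** 2
              for a, b in zip(A, B) if a > 0) / N
    unc = Y * (N - Y) / N ** 2
    # Correction terms; bins with fewer than two forecasts are skipped.
    S = sum(b * (a - b) / (a * (a - 1))
            for a, b in zip(A, B) if a > 1) / N
    T = Y * (N - Y) / (N ** 2 * (N - 1))
    return rel - S, res - S + T, unc + T


# Hypothetical column sums for six forecasts in D = 10 bins.
A = [1, 2, 0, 0, 1, 0, 0, 0, 0, 2]
B = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
C = [0.05, 0.30, 0.0, 0.0, 0.45, 0.0, 0.0, 0.0, 0.0, 2.0]
Y = 3

rel_c, res_c, unc_c = bias_corrected_decomposition(A, B, C, Y)
print(rel_c, res_c, unc_c)
```

With so few data points the corrected resolution comes out negative and the corrected uncertainty exceeds 0.25, which is exactly the kind of inconsistency that a bias correction can produce in small samples.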
Due to the analytical expressions Eqs. (5)-(7),
it holds that REL* $\in [0, 1]$, RES* $\in [0, 1]$ and
UNC* $\in [0, 0.25]$. One could argue that estimators
for the individual components should be confined
to these intervals as well. While the traditional
estimators always satisfy this restriction, the
bias-corrected estimators do not. Ferro and Fricker
(2012) acknowledge the possibilities REL' < 0 and
RES' < 0, and recommend a suitable modification
to their bias correction. Unfortunately, this
modification does not account for the possibilities
UNC' > 0.25 and RES' > 1. In Appendix B a
modification of the bias-corrected estimators is
suggested which avoids all possible inconsistencies.

A note on terminology: In order to limit confusion 
due to repeated use of the word estimate, we shall 
always use the term estimator to refer to the com- 
ponents of the Brier Score decomposition estimated 
by Eqs. (16)-(21), and the term variance estimates
to refer to the approximated variance of these
components.

In Sec. 2 of this article it is shown how propagation
of uncertainty can be applied to calculate variance
estimates for the estimators of a Brier Score
decomposition. The variance estimates are validated in an
artificial prediction setting in Sec. 3. Application to
a meteorological prediction problem in Sec. 4 illustrates
a possible use case. In Sec. 5 the simplifying
assumptions, validity of the new variance estimates,
and variance increase of the bias-corrected estimators
are discussed. Section 6 concludes the article.
The article is complemented with Supplementary
Online Material which includes source code written
in the R programming environment (R Core Team,
2012) to reproduce all calculations. A library for
the R environment (Siegert and R Core Team, 2013)
is available to apply the results of this study in
practice.



2 Variance estimation by propagation of uncertainty

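The general setting is now that we have scalar
estimators $F$ for the components of a Brier Score
decomposition, which depend nonlinearly on the
column sums $x$ of a matrix $X$:

$$F(X_{\bullet}) =: F(x). \qquad (22)$$

For example, if $F = \mathrm{REL}$ we have

$$X = [A \,|\, B \,|\, C] \in \mathbb{R}^{N \times 3D}, \qquad (23)$$
$$x = \mathbf{1}^T X = [A_{\bullet} \,|\, B_{\bullet} \,|\, C_{\bullet}] \in \mathbb{R}^{1 \times 3D}. \qquad (24)$$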


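It is possible to apply propagation of uncertainty
(e.g. Mood et al., 1974) to estimate the variance
of $F(x)$ as a function of the covariances of its
arguments. The first-order Taylor expansion of $F$
around $\bar{x}$ (the expectation value of $x$) is given by

$$F(x) \approx F(\bar{x}) + \frac{\partial F(\bar{x})}{\partial x} (x - \bar{x})^T, \qquad (25)$$

where $\partial F(\bar{x}) / \partial x$ is shorthand for the Jacobian of
$F(x)$ evaluated at $\bar{x}$. Under this approximation,
the variance of $F(x)$ is given by

$$V[F(x)] = E\left[F(x) - E F(x)\right]^2 \qquad (26)$$
$$\approx \frac{\partial F}{\partial x} \, \mathrm{Cov}(x) \, \frac{\partial F}{\partial x}^T, \qquad (27)$$

where $\mathrm{Cov}(x) = E[(x - \bar{x})^T (x - \bar{x})]$. Recall that
the $i$-th element of $x$ is the sum over the $i$-th
column of $X$. Under the assumption that the rows
of $X$ are iid, it can be shown that

$$\mathrm{Cov}(x) \approx X^T \left( I - \frac{1}{N} \mathbf{1} \mathbf{1}^T \right) X, \qquad (28)$$

using the fact that $\mathrm{Cov}(x_i, x_j) = N \, \mathrm{Cov}(X_{ni}, X_{nj})$,
where $X_{ni}$ and $X_{nj}$ are entries of the same row of $X$,
and estimating the latter by the sample covariance.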

Equation (27) combined with Eq. (28) provides a
recipe to estimate the variances of the estimators
REL, RES, and UNC, as well as their bias-corrected
counterparts. All data that is necessary to estimate
the variances has already been calculated for the
estimators themselves. The only tedious bit is the
calculation of the derivatives of the estimators with
respect to the individual column sums for the
Jacobian. These derivatives are given in Appendix A.
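The recipe can be sketched for the simplest case, the uncertainty estimator UNC $= Y_{\bullet}(N - Y_{\bullet})/N^2$, which depends on a single column sum. The Python code below is a hedged illustration: the function names and the ten example verifications are hypothetical, the covariance is computed as $X^T(I - \mathbf{1}\mathbf{1}^T/N)X$, and the variance as Jacobian times covariance times transposed Jacobian.

```python
def covariance_of_column_sums(X):
    """Estimate Cov(x) as X^T (I - 11^T / N) X, which equals the sum
    over rows of the outer products of the centered rows.
    X is a list of N rows, each a list of K numbers."""
    N, K = len(X), len(X[0])
    means = [sum(row[k] for row in X) / N for k in range(K)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in X)
         for j in range(K)]
        for i in range(K)
    ]


def propagated_variance(jacobian, cov):
    """First-order variance approximation J Cov(x) J^T."""
    K = len(jacobian)
    return sum(jacobian[i] * cov[i][j] * jacobian[j]
               for i in range(K) for j in range(K))


# Hypothetical verifications (not from the article).
y = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
N, Y = len(y), sum(y)
X = [[y_n] for y_n in y]            # N x 1 matrix holding Y_n = y_n
jacobian = [(N - 2 * Y) / N ** 2]   # derivative of UNC w.r.t. Y
var_unc = propagated_variance(jacobian, covariance_of_column_sums(X))
print(var_unc)
```

For REL and RES the same two functions apply unchanged; only the matrix `X` (with 3D columns) and the Jacobian differ.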



3 Application to artificial data 

In order to illustrate their validity, we apply the 
variance estimates to Brier Score decomposition in 
an artificial prediction setting, for which the compo- 
nents of the decomposition are known analytically. 
The results are discussed in Sec. 5. The code to
reproduce the numerical computations of this article
is available in the Supplementary Online Material.

In the artificial example, we assume that the event
$y \in \{0, 1\}$ is an independent realization of a Bernoulli
trial with success probability $q$. If $y = 1$, we say
that 'the event occurs'. In our example, the event
probability $q$ is itself a random variable that is equally
likely to assume one of 6 possible values, namely
$q \in \{q_d\}_{d=1}^{6} = \{0.05, 0.15, \dots, 0.55\}$. A forecasting
scheme for the event $y$ which has nonzero resolution
and nonzero reliability is constructed as follows:
The forecast probability $p$ corresponds to the
actual event probability $q$ whenever $q \neq 0.55$. But
whenever the event probability $q = 0.55$, the forecast
probability is equal to $p = 1$. That is,
$p \in \{p_d\}_{d=1}^{6} = \{q_1, \dots, q_5, 1\}$, with equal
probability of 1/6 each.


For the above scheme, the climatological probability
is equal to

$$\bar\pi = \frac{1}{6} \sum_{d=1}^{6} q_d = \frac{3}{10}. \qquad (29)$$

The true uncertainty of this forecasting scheme is
thus given by

$$\mathrm{UNC}^* = \bar\pi (1 - \bar\pi) = \frac{21}{100}. \qquad (30)$$

Furthermore, since the calibration function in this
setting is given by

$$\pi(p_d) = q_d, \qquad (31)$$

the true reliability component of the Brier Score of
the forecast $p$ is calculated as

$$\mathrm{REL}^* = \frac{1}{6} \sum_{d=1}^{6} (p_d - q_d)^2 = \frac{27}{800}, \qquad (32)$$


and the true resolution of the forecast is given by

$$\mathrm{RES}^* = \frac{1}{6} \sum_{d=1}^{6} (q_d - \bar\pi)^2 = \frac{7}{240}. \qquad (33)$$

Note that in this example REL* > RES*, and therefore
the forecast is 'useless' in the sense that the
constant climatological probability $\bar\pi$ achieves a
better Brier Score (which is equal to UNC*) than the
forecast probability $p$.

A single numerical experiment consists of $N = 250$
forecast probabilities $p_n$, and corresponding event
indicators $y_n$, independently sampled as outlined
above. Each such experiment results in a data set of
forecasts and verifications $\{p_n, y_n\}_{n=1}^{N}$ and a Brier
Score decomposition is estimated for this data set.
For the empirical decomposition, we bin the forecast
probabilities into 10 non-overlapping bins of equal
width. Under this binning, in-bin-averages are exactly
equal to the actual forecast probabilities, as
the chosen binning is somewhat 'natural' in this
forecast scenario. For infinitely many forecast
instances the estimators would thus converge to the
true components, without further discrepancies
introduced by the binning. In our example, the first 5
bins and the 10-th bin are each occupied with
probability 1/6, and the others are never occupied. The
resulting estimators REL, RES, and UNC, as well
as their bias-corrected counterparts REL', RES',
and UNC' are calculated for this data, together
with their corresponding variance estimates derived
in Sec. 2. This whole experiment is repeated 100
times, each time with a new realization of forecast
probabilities $p_n$ and corresponding event indicators
$y_n$.

The results of these 100 trials are illustrated in
Fig. 1. For each trial, the traditional (left) and
bias-corrected (right) estimators for reliability,
resolution, and uncertainty are shown, augmented with
error bars with a half width of two estimated
standard deviations.

In Table 1 the outcome of the experiment is further
quantified by statistical summary measures.
To make the calculation of these summary measures
precise, consider as an example the estimator REL.
Define $\overline{\mathrm{REL}} = \frac{1}{100} \sum_{i=1}^{100} \mathrm{REL}_i$, where
$\mathrm{REL}_i$ is the estimator REL obtained on the $i$-th
trial. The sample variance (first column) was calculated
by $\frac{1}{100} \sum_{i=1}^{100} (\mathrm{REL}_i - \overline{\mathrm{REL}})^2$, the average
estimated variance (second column) was calculated by
$\frac{1}{100} \sum_{i=1}^{100} V[\mathrm{REL}_i]$, the average squared error (third
column) was calculated by $\frac{1}{100} \sum_{i=1}^{100} (\mathrm{REL}_i - \mathrm{REL}^*)^2$,
and the average bias (fourth column) was calculated
by $\frac{1}{100} \sum_{i=1}^{100} (\mathrm{REL}_i - \mathrm{REL}^*)$. Summary measures
for the other components were calculated accordingly.
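The analytical values of the artificial example, Eqs. (29)-(33), can be double-checked with exact rational arithmetic. The short Python sketch below uses only the definitions of the example (six equally likely event probabilities 0.05, 0.15, ..., 0.55, with the forecast equal to the event probability except for the last value, where the forecast is 1); the variable names are illustrative.

```python
from fractions import Fraction

# Event probabilities q_d = 0.05, 0.15, ..., 0.55 as exact fractions.
q = [Fraction(5 + 10 * d, 100) for d in range(6)]
# Forecast probabilities: p_d = q_d, except p_6 = 1.
p = q[:5] + [Fraction(1)]

climatology = sum(q) / 6
unc_true = climatology * (1 - climatology)
rel_true = sum((p_d - q_d) ** 2 for p_d, q_d in zip(p, q)) / 6
res_true = sum((q_d - climatology) ** 2 for q_d in q) / 6

# Expected Brier Score of the forecast, computed directly.
brier_true = sum(q_d * (1 - p_d) ** 2 + (1 - q_d) * p_d ** 2
                 for p_d, q_d in zip(p, q)) / 6

print(climatology, unc_true, rel_true, res_true, brier_true)
```

The directly computed expected Brier Score agrees with REL* - RES* + UNC*, and it is larger than UNC*, consistent with the remark that the constant climatological forecast scores better in this example.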



4 Meteorological application 

We apply Brier Score decomposition to real forecast
data and use the variance estimates to quantify
the variability of the components of the decomposition.
We use daily maximum temperature observations
measured at Dresden, Germany (WMO no. 10488)
between 1980/01/01 and 1999/12/31
(Deutscher Wetterdienst, 2012). Our (binary)
prediction target is the exceedance of a certain
threshold one day in the future.

The data between 1980/01/01 and 1989/12/31 is
used as training data. Denote this data by $T'_n$,
where $n$ is an integer that indicates 'days since
1970/01/01'. We omit the unit of $T'_n$ and remember
that it is measured in °C. We obtain the seasonal
cycle $c_n$ by fitting a second order trigonometric
polynomial to the observations:

$$c_n = \beta_0 + \beta_1 \cos(\omega n) + \beta_2 \sin(\omega n) + \beta_3 \cos(2 \omega n) + \beta_4 \sin(2 \omega n), \qquad (34)$$

where $\omega = 2\pi / (365.2425\ \mathrm{days})$ and the coefficients
were fitted by minimizing the sum of squared
differences between $c_n$ and $T'_n$ using ordinary linear
Figure 1: Illustration of the experiment with artificial data. For each trial of the experiment, the traditional 
and bias-corrected estimators of the Brier Score components are shown, augmented with error bars with a 
half-width of 2 estimated standard deviations. The bold black line indicates the true value. 





        sample variance   avg. est. variance   avg. squared error   avg. bias
REL     1.540 x 10^-4     1.665 x 10^-4        1.641 x 10^-4         3.182 x 10^-3
REL'    1.548 x 10^-4     1.687 x 10^-4        1.563 x 10^-4        -1.184 x 10^-3
RES     7.062 x 10^-5     8.521 x 10^-5        8.093 x 10^-5        32.101 x 10^-4
RES'    7.220 x 10^-5     8.746 x 10^-5        7.230 x 10^-5        -3.155 x 10^-4
UNC     1.561 x 10^-4     1.336 x 10^-4        1.565 x 10^-4        -6.195 x 10^-4
UNC'    1.573 x 10^-4     1.347 x 10^-4        1.574 x 10^-4         2.214 x 10^-4

Table 1: Summary of the artificial example. All averages are taken over the 100 trials of Fig. 1. The first
column shows the sample variance of the estimators. The second column shows the average of the estimated 
variances. The third column shows the average squared difference between the estimator and the true value. 
The fourth column shows the average bias, that is the average difference between the estimated value and 
the true value. 
regression. For the data at hand, we obtain

$$\{\beta_0, \dots, \beta_4\} = \{13.2, -10.7, -3.1, -0.6, 0.03\}$$

over the training period. Using the seasonal cycle,
the anomalies $T_n$ are defined by

$$T_n = T'_n - c_n. \qquad (35)$$

Next, a first-order autoregressive model is fitted to
the anomalies, using the R function ar provided by
the stats package (R Core Team, 2012). That is,
the temperature anomaly $T_{n+1}$, conditional on the
anomaly $T_n$, is modeled by

$$T_{n+1} = \alpha T_n + \sigma \epsilon_n, \qquad (36)$$

where $\alpha$ is the AR parameter which quantifies the
serial dependence of successive temperature anomalies,
$\sigma^2$ is the variance of the residuals, and $\epsilon_n$ is
a realization of Gaussian white noise. We obtain
$\alpha = 0.77$ and $\sigma = 2.97$ in the training data.

Our prediction target is whether the temperature
anomaly at time $n$ exceeds a threshold of $\tau$ °C,
that is $y_n = I(T_n > \tau)$, where the forecast is issued
on the previous day. Using the autoregressive model,
we produce a probabilistic 24h exceedance forecast
using the formula

$$p_n = P(T_n > \tau \mid T_{n-1} = t) = 1 - \Phi_{\alpha t, \sigma}(\tau), \qquad (37)$$

where $\Phi_{\mu, \sigma}(x)$ is the cumulative Gaussian distribution
function with mean $\mu$ and variance $\sigma^2$, evaluated
at $x$. Using Eq. (37) and the parameters
obtained from the training data, daily forecasts are
produced for the time between 1990/01/01 and 1999/12/31.
The forecast probabilities $p_n$ for the targets $y_n$ are
analyzed by decomposition of the Brier Score.
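The exceedance forecast of Eq. (37) is straightforward to evaluate. The Python sketch below is an illustration rather than the article's R code: the function names are hypothetical, the Gaussian distribution function is written in terms of `math.erf`, and the default parameter values are the fitted values 0.77 and 2.97 quoted in the text.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Gaussian with the given
    mean and standard deviation, evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def exceedance_probability(t_prev, tau, alpha=0.77, sigma=2.97):
    """Probability that tomorrow's anomaly exceeds tau, given today's
    anomaly t_prev, under the AR(1) model T_next = alpha*T + sigma*eps."""
    return 1.0 - normal_cdf(tau, alpha * t_prev, sigma)


# Forecasts for the threshold tau = 5 and a few hypothetical anomalies.
for t_prev in (-5.0, 0.0, 5.0, 10.0):
    print(t_prev, exceedance_probability(t_prev, tau=5.0))
```

The forecast probability increases monotonically with the current anomaly and equals 1/2 exactly when the conditional mean $\alpha t$ coincides with the threshold.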

The result of the analysis is presented in Table 2
for the choice of the threshold $\tau = 5$. Estimators
of the three components REL, RES, and UNC, in
the traditional and the bias-corrected version, are
given in the first row. Using these estimated
components, we get REL - RES + UNC = 0.0875. The
empirical Brier Score calculated by Eq. (1) is equal
to Br = 0.0868. In the second row of Table 2 the
corresponding variance estimates are shown.

In Fig. 2 the bias-corrected components of the
autoregressive exceedance forecast and the empirical
Brier Score are shown as functions of the threshold.
The error bars of half width two standard deviations
provide an estimate of the sampling variability
of the components.




Figure 2: Brier Score decomposition of the temperature
anomaly exceedance forecasts by an autoregressive
model. Upper panel: REL' and RES' as
a function of the threshold which defines the
exceedance event, augmented with error bars of half
width two estimated standard deviations. Please
note the different y-scales for REL' and RES'.
Lower panel: Same as above for UNC' and Br. In
this plot the y-scale is the same for both quantities.



5 Discussion 



The assumptions and simplifications that entered 
the derivation of the variance estimates must be 





           REL             RES             UNC             REL'            RES'            UNC'
estimate   9.060 x 10^-4   0.0542          0.1408          4.130 x 10^-4   0.0537          0.1408
variance   2.096 x 10^-7   1.157 x 10^-5   1.684 x 10^-5   2.173 x 10^-7   1.163 x 10^-5   1.685 x 10^-5



Table 2: Summary of Brier Score decomposition of 10 years' worth of temperature anomaly exceedance 
forecasts (1 day lead time, threshold 5°C) by an autoregressive model. 



discussed. The first simplification of the problem
was the first order Taylor expansion in Eq. (25).
Its validity relies on the assumption that the
difference between the observed values of the arguments
and their expectation values is small enough that
quadratic terms can be ignored. This need not be
the case, especially if the number $N$ of forecasts
and verifications is small. To estimate the covariance
matrix by Eq. (28), we make implicit use of
the assumption that the pairs of forecast probabilities
and event indicators $\{p_n, y_n\}$ are independent
for different $n$. This assumption might not hold in
meteorological applications because the probability
of rain on day $n + 1$, for example, is often similar
to the probability of rain on day $n$. In the light of
the above criticism we should expect that more
accurate variance estimates than the ones presented
here ought to exist. Nonetheless, Fig. 1 suggests
that we obtain reasonable variance estimates
despite all the simplifying assumptions.

Figure 1 further suggests that the two-standard-deviation
confidence intervals cover the true value
with probability of around 95%, which is the cor- 
rect value assuming Gaussianity and unbiasedness 
of the estimators. For non-Gaussian and biased 
estimators, coverage probability is not a suitable 
criterion. In the artificial example the biases are 
about one order of magnitude smaller than the over- 
all variability of the estimators, and the variations 
of the estimators appear symmetric around their 
mean and without large deviations. Unbiasedness 
and Gaussianity thus seem to be good first approxi- 
mations to the statistical behavior of the data. Ad- 
equate coverage frequency is thus taken as evidence 
for the quality of the variance estimates. 

Table 1 illustrates the decrease of the biases by
the estimators derived by Ferro and Fricker (2012).
The magnitude of the average difference between
the estimator and the true values is substantially
lower for the bias-corrected estimators than for the
traditional estimators. At the same time, however,
the variances (both estimated and sampled) of these
bias-corrected estimators are slightly larger than
the variances of the traditional estimators. This is
an example of the bias-variance tradeoff, regularly
encountered in statistical estimation problems (e.g.
Eldar, 2008). In fact, Table 1 shows that the
reduction of the bias in the uncertainty, which comes at
the cost of an increased variance, leads to a slight
increase in the average squared error of this
estimator. That is, even though the bias is reduced,
the average squared difference between the estimator
and the true value has increased. For the other
two estimators, this is not the case: the increase in
variance does not offset the bias-correction.
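The trade-off can be read off Table 1 directly, because the average squared error decomposes into variance plus squared bias. The Python sketch below checks this identity on the tabulated values; it assumes that the sample variance was normalized by the number of trials (in which case the identity is exact up to the rounding of the table entries). The dictionary layout and the tolerance are illustrative choices.

```python
# (sample variance, average squared error, average bias) from Table 1.
table_1 = {
    "REL":  (1.540e-4, 1.641e-4, 3.182e-3),
    "REL'": (1.548e-4, 1.563e-4, -1.184e-3),
    "RES":  (7.062e-5, 8.093e-5, 32.101e-4),
    "RES'": (7.220e-5, 7.230e-5, -3.155e-4),
    "UNC":  (1.561e-4, 1.565e-4, -6.195e-4),
    "UNC'": (1.573e-4, 1.574e-4, 2.214e-4),
}

# Largest discrepancy between the tabulated squared error and the sum
# of variance and squared bias, across all six estimators.
max_gap = max(abs(mse - (var + bias ** 2))
              for var, mse, bias in table_1.values())

# Bias correction helps REL and RES, but not UNC, in squared error.
improved = {name: table_1[name + "'"][1] < table_1[name][1]
            for name in ("REL", "RES", "UNC")}
print(max_gap, improved)
```

For the uncertainty, the squared bias is so small compared to the variance that removing part of it cannot compensate for the slight variance increase.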

In Sec. 4, Brier Score decomposition has been applied
to autoregressive forecasts of exceedance events
of temperature anomalies. The Brier Score decom- 
position was applied to 10 years' worth of daily 
data. The two-standard-deviation error bars of all 
estimators are relatively wide, considering that the 
decomposition is based on more than 3000 data 
points. In evaluation studies of weather forecasts, 
usually much less data is available and the variabil- 
ity of the estimators must be expected to be higher 
in these cases. Reliable estimates of the variability 
of the components of the Brier Score decomposi- 
tion are required for an honest assessment of the 
significance of the results.



6 Summary and conclusions 

The components of the Brier Score decomposition 
can be used to assess the forecast attributes relia- 
bility and resolution, as well as the inherent uncer- 
tainty of the underlying process. The decomposi- 
tion thus provides insight that goes beyond quan- 
tifying the performance by calculating the average 
Brier Score. We have derived variance estimates 
for the traditional and bias-corrected estimators of 
the components of Brier Score decomposition. The 
variances are approximated by propagation of un- 
certainty. The validity of the variance estimates 
was illustrated using artificial data, where the true 
values of the components are known. An actual 
meteorological forecast setting illustrated a possible 
application. A discussion was provided about the 
implied assumptions, as well as the consequences of 
bias-correction. 

We conclude that, in the cases considered, the vari- 
ance estimates provide meaningful approximations 
as to the statistical variability of the components 
of Brier Score decomposition. Confidence inter- 
vals have reasonable coverage probabilities, and es- 
timated and empirical variances coincide, despite 
numerous simplifying assumptions. Furthermore,
we note that bias-correction comes at the cost of 
an increased estimator variance. An example was 
shown where the bias-correction was not able to 
decrease the average squared difference of the esti- 
mator from its true value. 

Forecasters who want to compare competing prob- 
abilistic forecasting schemes based on finite data 
will certainly find the competing Brier Score com- 
ponents to be different due to statistical fluctua- 
tions alone. Using the variance estimates proposed 
here, the magnitude of these statistical fluctuations 
can be quantified approximately. This makes pos- 
sible a more realistic assessment of the significance 
of the observed differences, and therefore a more 
robust comparison in terms of true predictive skill. 



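A Appendix: Derivatives

Note that some of the following derivatives can be
undefined due to vanishing denominators. These
derivatives must be set to zero.

A.1 REL

$$\frac{\partial \mathrm{REL}}{\partial A_{\bullet d}} = -\frac{(B_{\bullet d} - C_{\bullet d})^2}{N A_{\bullet d}^2} \qquad (38)$$
$$\frac{\partial \mathrm{REL}}{\partial B_{\bullet d}} = \frac{2 (B_{\bullet d} - C_{\bullet d})}{N A_{\bullet d}} \qquad (39)$$
$$\frac{\partial \mathrm{REL}}{\partial C_{\bullet d}} = -\frac{2 (B_{\bullet d} - C_{\bullet d})}{N A_{\bullet d}} \qquad (40)$$

A.2 RES

$$\frac{\partial \mathrm{RES}}{\partial A_{\bullet d}} = -\frac{1}{N} \left( \frac{B_{\bullet d}}{A_{\bullet d}} - \frac{Y_{\bullet}}{N} \right) \left( \frac{B_{\bullet d}}{A_{\bullet d}} + \frac{Y_{\bullet}}{N} \right) \qquad (41)$$
$$\frac{\partial \mathrm{RES}}{\partial B_{\bullet d}} = \frac{2}{N} \left( \frac{B_{\bullet d}}{A_{\bullet d}} - \frac{Y_{\bullet}}{N} \right) \qquad (42)$$
$$\frac{\partial \mathrm{RES}}{\partial Y_{\bullet}} = -\frac{2}{N^2} \sum_{d \in \mathcal{D}_0} \left( B_{\bullet d} - \frac{A_{\bullet d} Y_{\bullet}}{N} \right) = 0 \qquad (43)$$

The last equality in Eq. (43) follows from
$\sum_d B_{\bullet d} = Y_{\bullet}$ and $\sum_d A_{\bullet d} = N$.

A.3 UNC

$$\frac{\partial \mathrm{UNC}}{\partial Y_{\bullet}} = \frac{N - 2 Y_{\bullet}}{N^2} \qquad (44)$$

A.4 REL'

$$\frac{\partial \mathrm{REL}'}{\partial A_{\bullet d}} = -\frac{(B_{\bullet d} - C_{\bullet d})^2}{N A_{\bullet d}^2} + \frac{B_{\bullet d}}{N A_{\bullet d}^2 (A_{\bullet d} - 1)^2} \left[ (A_{\bullet d} - B_{\bullet d})^2 - B_{\bullet d} (B_{\bullet d} - 1) \right] \qquad (45)$$
$$\frac{\partial \mathrm{REL}'}{\partial B_{\bullet d}} = \frac{2 B_{\bullet d} - 1}{N (A_{\bullet d} - 1)} - \frac{2 C_{\bullet d}}{N A_{\bullet d}} \qquad (46)$$
$$\frac{\partial \mathrm{REL}'}{\partial C_{\bullet d}} = -\frac{2 (B_{\bullet d} - C_{\bullet d})}{N A_{\bullet d}} \qquad (47)$$

A.5 RES'

$$\frac{\partial \mathrm{RES}'}{\partial A_{\bullet d}} = -\frac{1}{N} \left( \frac{B_{\bullet d}}{A_{\bullet d}} - \frac{Y_{\bullet}}{N} \right) \left( \frac{B_{\bullet d}}{A_{\bullet d}} + \frac{Y_{\bullet}}{N} \right) + \frac{B_{\bullet d}}{N A_{\bullet d}^2 (A_{\bullet d} - 1)^2} \left[ (A_{\bullet d} - B_{\bullet d})^2 - B_{\bullet d} (B_{\bullet d} - 1) \right] \qquad (48)$$
$$\frac{\partial \mathrm{RES}'}{\partial B_{\bullet d}} = \frac{2}{N} \left( \frac{B_{\bullet d}}{A_{\bullet d}} - \frac{Y_{\bullet}}{N} \right) - \frac{A_{\bullet d} - 2 B_{\bullet d}}{N A_{\bullet d} (A_{\bullet d} - 1)} \qquad (49)$$
$$\frac{\partial \mathrm{RES}'}{\partial Y_{\bullet}} = \frac{N - 2 Y_{\bullet}}{N^2 (N - 1)} \qquad (50)$$

A.6 UNC'

$$\frac{\partial \mathrm{UNC}'}{\partial Y_{\bullet}} = \frac{N - 2 Y_{\bullet}}{N (N - 1)} \qquad (51)$$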
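Analytical derivatives of this kind are easy to get wrong, so a finite-difference check is a useful safeguard. The Python sketch below does this for the contribution of a single bin to the bias-corrected reliability, $(B - C)^2 / (N A) - B (A - B) / (N A (A - 1))$, treating $N$ as a constant; the function names, the step size, and the test values are illustrative assumptions and not part of the article's code.

```python
def rel_corrected_bin(a, b, c, n):
    """Contribution of one bin (column sums a, b, c) to REL'."""
    return (b - c) ** 2 / (n * a) - b * (a - b) / (n * a * (a - 1))


def d_rel_corrected_da(a, b, c, n):
    """Analytical derivative of the bin contribution w.r.t. a."""
    return (-(b - c) ** 2 / (n * a ** 2)
            + b * ((a - b) ** 2 - b * (b - 1)) / (n * a ** 2 * (a - 1) ** 2))


def d_rel_corrected_db(a, b, c, n):
    """Analytical derivative of the bin contribution w.r.t. b."""
    return (2 * b - 1) / (n * (a - 1)) - 2 * c / (n * a)


def d_rel_corrected_dc(a, b, c, n):
    """Analytical derivative of the bin contribution w.r.t. c."""
    return -2 * (b - c) / (n * a)


def central_difference(f, x, h=1e-5):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Hypothetical bin: 20 forecasts, 7 events, forecast sum 6.3, N = 250.
a, b, c, n = 20.0, 7.0, 6.3, 250.0
num_da = central_difference(lambda v: rel_corrected_bin(v, b, c, n), a)
num_db = central_difference(lambda v: rel_corrected_bin(a, v, c, n), b)
num_dc = central_difference(lambda v: rel_corrected_bin(a, b, v, n), c)
print(num_da - d_rel_corrected_da(a, b, c, n),
      num_db - d_rel_corrected_db(a, b, c, n),
      num_dc - d_rel_corrected_dc(a, b, c, n))
```

The same pattern can be applied to the other estimators before the derivatives are used to build the Jacobian.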
B Appendix: Avoiding inconsistencies due to the bias correction

The bias-correction proposed by Ferro and Fricker (2012)
can be imagined as shifting the 3-vector
$\mathbf{d} = (\mathrm{REL}, \mathrm{RES}, \mathrm{UNC})$ to a new point

$$\mathbf{d}' = (\mathrm{REL}', \mathrm{RES}', \mathrm{UNC}') = \mathbf{d} + \mathbf{c} \qquad (52)$$

along a plane of constant Brier Score. Let the variables
$S$ and $T$ be defined by REL' = REL - $S$ (cf.
Eq. (19)) and UNC' = UNC + $T$ (cf. Eq. (21)).
Denote by $\mathcal{A} = [0, 1] \times [0, 1] \times [0, 0.25]$ the space
of 'allowed' Brier Score decompositions. In order
to avoid inconsistencies due to $\mathbf{d}' \notin \mathcal{A}$, a possible
modification is to use the bias-correction

$$\mathbf{d}'' = (\mathrm{REL}'', \mathrm{RES}'', \mathrm{UNC}'') = \mathbf{d} + \gamma \mathbf{c}, \qquad (53)$$

where $\gamma$ is given by

$$\gamma = \min\left\{ 1, \; \frac{\mathrm{REL}}{S}, \; \max\left[ \frac{\mathrm{RES}}{S - T}, \frac{\mathrm{RES} - 1}{S - T} \right], \; \frac{1 - 4\,\mathrm{UNC}}{4 T} \right\}. \qquad (54)$$

The parameter $\gamma$ is confined to the unit interval,
and ensures that neither REL'' < 0 nor RES'' < 0
nor RES'' > 1 nor UNC'' > 1/4. Essentially, $\gamma$
ensures that the decomposition $\mathbf{d}$ is shifted linearly
as far as possible towards the bias-corrected
decomposition $\mathbf{d}'$, but not so far as to carry any of the
components out of their allowed range.
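The modified correction can be sketched as follows. The Python function assumes the form $\gamma = \min\{1, \mathrm{REL}/S, \max[\mathrm{RES}/(S-T), (\mathrm{RES}-1)/(S-T)], (1 - 4\,\mathrm{UNC})/(4T)\}$ and the shift vector $(-S, -(S - T), T)$. The guards against vanishing denominators, the function name, and the numerical example are assumptions added for illustration; they are not specified in the text.

```python
def consistent_correction(rel, res, unc, S, T):
    """Shrink the bias correction by a factor gamma in [0, 1] so that
    the corrected components stay in [0, 1] x [0, 1] x [0, 0.25].
    Terms with a vanishing denominator impose no constraint and are
    skipped (an assumption of this sketch)."""
    bounds = [1.0]
    if S > 0:
        bounds.append(rel / S)                        # keeps REL'' >= 0
    if S != T:
        bounds.append(max(res / (S - T),              # keeps RES'' >= 0
                          (res - 1.0) / (S - T)))     # keeps RES'' <= 1
    if T > 0:
        bounds.append((1.0 - 4.0 * unc) / (4.0 * T))  # keeps UNC'' <= 1/4
    gamma = min(bounds)
    corrected = (rel - gamma * S,
                 res - gamma * (S - T),
                 unc + gamma * T)
    return gamma, corrected


# Hypothetical values for which the full correction would push the
# reliability below zero (0.02 - 0.03 < 0).
gamma, (rel2, res2, unc2) = consistent_correction(
    rel=0.02, res=0.05, unc=0.24, S=0.03, T=0.005)
print(gamma, rel2, res2, unc2)
```

In this example the reliability constraint is the binding one, so the correction is applied only up to the point where the corrected reliability reaches zero, while the sum REL'' - RES'' + UNC'' remains equal to REL - RES + UNC.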